AITopics | regret-optimal model-free reinforcement learning

regret-optimal model-free reinforcement learning

Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time

Neural Information Processing Systems | Dec-27-2025, 07:52:33 GMT

A crucial problem in reinforcement learning is learning the optimal policy. We study this in tabular infinite-horizon discounted Markov decision processes under the online setting. Existing algorithms either fail to achieve regret optimality or incur a high memory and computational cost. In addition, existing optimal algorithms all require a long burn-in time to achieve optimal sample efficiency, i.e., their optimality is not guaranteed unless the sample size surpasses a high threshold. We address both open problems by introducing a model-free algorithm that employs variance reduction and a novel technique that switches the execution policy in a slow-yet-adaptive manner. This is the first regret-optimal model-free algorithm in the discounted setting, with the additional benefit of a low burn-in time.
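To make the setting concrete, here is a minimal sketch of what "model-free" means for a tabular discounted MDP in the online (single-trajectory) setting. It is a generic optimistic Q-learning loop, not the algorithm from the paper: it omits variance reduction and the slow-yet-adaptive policy switching, so it carries none of the paper's guarantees. The function name, the toy MDP and the 1/n step size are all assumptions made for illustration.

```python
import random

def q_learning_discounted(P, R, gamma, steps, seed=0):
    """Generic optimistic tabular Q-learning on one continuing trajectory.

    P[s][a] is a list of next-state probabilities and R[s][a] a reward in [0, 1].
    Only an S x A table of Q-values and visit counts is stored; the transition
    model P is used solely to simulate the environment, never by the learner.
    """
    rng = random.Random(seed)
    S, A = len(P), len(P[0])
    q_max = 1.0 / (1.0 - gamma)          # largest possible discounted return
    Q = [[q_max] * A for _ in range(S)]  # optimistic initialization
    N = [[0] * A for _ in range(S)]      # visit counts per (state, action)
    s = 0
    for _ in range(steps):
        # Act greedily with respect to the current (optimistic) estimates.
        a = max(range(A), key=lambda b: Q[s][b])
        s_next = rng.choices(range(S), weights=P[s][a])[0]
        N[s][a] += 1
        lr = 1.0 / N[s][a]               # simple 1/n step size (an assumption)
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] += lr * (target - Q[s][a])
        s = s_next
    return Q

# Hypothetical two-state, two-action MDP, used only to exercise the sketch.
P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.9, 0.1], [0.1, 0.9]]]
R = [[0.0, 0.0],
     [0.0, 1.0]]
Q = q_learning_discounted(P, R, gamma=0.9, steps=5000)
```

The learner keeps S x A numbers for the estimates plus S x A counters, which is the memory footprint that distinguishes model-free methods from model-based ones that store an estimate of the full transition kernel.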

name change, regret-optimal model-free reinforcement learning, short burn-in time, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)


Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time

Neural Information Processing Systems | Jan-20-2025, 03:36:33 GMT

model-free algorithm, regret-optimal model-free reinforcement learning, short burn-in time, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)


Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Neural Information Processing Systems | Jan-17-2025, 14:19:09 GMT

Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with $S$ states, $A$ actions and horizon length $H$, substantial progress has been achieved towards characterizing the minimax-optimal regret, which scales on the order of $\sqrt{H^2SAT}$ (modulo log factors) with $T$ the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient, or fall short of optimality unless the sample size exceeds an enormous threshold (e.g., $S^6A^4\,\mathrm{poly}(H)$ for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity $O(SAH)$, that achieves near-optimal regret as soon as the sample size exceeds the order of $SA\,\mathrm{poly}(H)$. In terms of this sample size requirement (also referred to as the initial burn-in cost), our method improves, by at least a factor of $S^5A^3$, upon any prior memory-efficient algorithm that is asymptotically regret-optimal. Leveraging the recently introduced variance reduction strategy (also called reference-advantage decomposition), the proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds.
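The scalings quoted in the abstract can be checked with a little arithmetic. The sketch below evaluates the order of the minimax regret, sqrt(H^2 * S * A * T), and the ratio between the two burn-in thresholds. The function names and the numeric values of S, A, H and T are hypothetical. Log factors and the unspecified poly(H) terms are dropped, so the outputs indicate orders of magnitude only.

```python
import math

def regret_scale(S, A, H, T):
    """Order of the minimax regret sqrt(H^2 * S * A * T), log factors dropped."""
    return math.sqrt(H**2 * S * A * T)

def burn_in_ratio(S, A):
    """Ratio (S^6 * A^4) / (S * A) between the two burn-in thresholds.

    The poly(H) factors are not specified in the abstract and are ignored here.
    """
    return (S**6 * A**4) // (S * A)

# Hypothetical problem sizes, chosen only for illustration.
S, A, H, T = 10, 4, 20, 1_000_000
print("regret order:", regret_scale(S, A, H, T))
print("average regret per sample:", regret_scale(S, A, H, T) / T)
print("burn-in improvement factor:", burn_in_ratio(S, A))
```

Because the regret grows like the square root of T, the average regret per sample shrinks like 1/sqrt(T), and the burn-in ratio simplifies to S^5 * A^3, the improvement factor stated in the abstract.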

artificial intelligence, machine learning, regret-optimal model-free reinforcement learning, (5 more...)

Neural Information Processing Systems

Industry: Energy > Oil & Gas > Upstream (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
